PART 1: 


SOFTWARE 


An 8-/16-bit successor to the 6502 


WITH SEMICONDUCTOR memory 
prices falling, microcomputer systems 
are boldly going where no micros have 
gone before. Systems with 64K-byte 
main memories are common, and those 
with a megabyte or more are just 
around the corner. Since such old 8-bit 
standbys as the 6502, 6800, and Z80 
can directly address only 64K bytes, 
their days as the backbone of general- 
purpose systems may be numbered. In- 
deed, the increasing number of per- 
sonal computers, like Apple and Atari, 
using bank switching and disk simula- 
tion to expand their semiconductor 
memory access shows the growing 
severity of the address-space pinch. 
Designed to relieve the constraints of 
limited memory, a new branch of the


6502 family directly addresses up to 16 
megabytes while bringing the simplici- 
ty and execution speed of this time- 
tested microprocessor to 16-bit systems. 
The main representative of this new 
group is the W65SC816, which I refer
to hereafter as the 65816. In emulation
mode, it is pin- and software-compatible
with the 6502, and toggling a single flag
bit converts it to a complete 16-bit
processor. In part 1 of this article I cover
W65SC816 PROCESSOR PROGRAMMING MODEL 
Figure 1: The programming model for the 65816 shows that the 6502's 8-bit registers 
have been expanded, the program counter extended, unused bits filled, and a new register 
(the direct register) added.
some of the software considerations of
this processor. Next month, part 2 will
address some hardware considerations.
Throughout, the term “page” refers to
a 256-byte memory area wherein
individual location addresses differ in
only the low-order 8 bits, just as for
the 6502. “Bank” refers to a 64K-byte
memory area with address locations that
differ in the low-order 16 bits, with
the upper 8 bits identical.


OVERVIEW 


Figure 1 shows the programming model
for the 65816. Close examination
reveals the 6502 registers with
extensions, plus one additional 16-bit
register formerly not present (the
direct register) and two 8-bit bank
registers. The 8-bit registers of the
6502 have been extended to 16 bits, the
6502's 16-bit program counter has been
effectively extended to 24 bits, and
the previously unused bits in the
status register have been filled (with
an extra bit added). The accumulator
and index registers still can be
treated optionally as 8-bit registers
by setting the appropriate status
register bits. Except for the
accumulator, the full 16-bit registers
retain the original 8-bit names. Each
half of each register uses the same
original name, and they are
distinguished from one another by an
“H” or “L” suffix for the high- and
low-order bytes, respectively. The
16-bit accumulator is newly designated
as “C.” The high- and low-order bytes
of the accumulator are referred to as
“B” and “A.”
The six status register flags matching 
those of the 6502 retain the same 
meanings. The seventh 6502 flag, “B,”
E = 1 (6502 Emulation)
00FFFE,F: IRQ/BRK (hardware/software)
00FFFC,D: RESET (hardware)

E = 0 (16-Bit Operation)
00FFFE,F: IRQ (hardware)
00FFFC,D: RESET (hardware)

Figure 2: When the 65816 is not 
emulating a 6502, it has a separate vector 
that defines the software interrupt's location 
and relieves the routine from having to 
determine which type of interrupt invoked it. 
is now “X.” On the 6502 (and on the
65816 in its 6502 emulation), X
indicates a BRK (Break) instruction. A
software interrupt, it transfers
control to the same routine as does the
hardware interrupt, IRQ (masked by flag
“I”). When not emulating a 6502, the
65816 adds a separate vector (see
figure 2) that defines the software
interrupt routine's location. This
vector relieves the routine from having
to determine which type of interrupt
invoked it.
The new “M” flag, as well as X, is used
when the 65816 is not emulating a
6502. They both determine whether
operations treat the accumulator and
index registers as 8- or 16-bit
entities. The M flag, when set to 1,
indicates that the accumulator is to be
treated as an 8-bit register; when set
to 0, the accumulator is 16 bits long.
The X flag has the same effect on the
index (X and Y) registers.
The added “E” flag determines
whether the 65816 emulates a 6502 or
acts as a 16-bit processor. Since the
status register is defined as having
only 8 bits, the E flag is conceptually
an extension of the C flag. While E
cannot be directly tested, set, or
cleared, XCE (exchange C and E flags)
permits swapping it with the C flag,
which can.
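
The exchange can be pictured with a small Python model. This is an illustration only, not processor code; the function name `xce` and the use of plain integers for the flags are assumptions made for the sketch.

```python
def xce(carry, emulation):
    """Model XCE: exchange the carry (C) and emulation (E) flags."""
    return emulation, carry

# Leaving 6502 emulation: clear C, then exchange it with E.
carry, emulation = 0, 1               # start in emulation (E = 1), C cleared
carry, emulation = xce(carry, emulation)
# E is now 0 (16-bit operation); C holds the old E and can be tested
# with the ordinary carry branches to learn the previous mode.
```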
The additional 16-bit register, called
the “direct register,” determines
placement of the “direct page” in bank
zero. As on the 6502, one special page
in memory is designated for the most
frequently used or most time-critical
portions of a program. Because the
address within this page can be
completely specified in a single byte,
instructions can be shorter. This
arrangement results in smaller
programs, and, since the processor
fetches one less byte per instruction,
execution is faster. On the 6502, this
page (also called “zero page”) resides
permanently at the beginning of memory;
that is, the high-order byte of the
address is 0. On the 65816, the direct
page can appear anywhere in bank zero,
even across normal page boundaries.
This feature permits more flexibility
in context switching and sharing data
between subroutines or coroutines. Also
notable, if the low-order byte of the
direct register (DL) is nonzero, the
65816 needs an extra cycle for any
instruction that has any form of direct
page addressing. The additional cycle
is used to add DL to an offset. If DL
is zero, the 65816 has special
circuitry to skip that cycle. Except in
special cases, therefore, the direct
page should begin on a page boundary.
The direct register can be accessed by
transferring data to or from either the
C accumulator or the stack.
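
As a rough illustration of the direct-page arithmetic just described, the following Python sketch forms the bank-zero address and reports the one-cycle penalty. The function name is hypothetical, and the 16-bit wraparound within bank zero is an assumption of this sketch rather than something stated above.

```python
def direct_page_address(d, offset):
    """Return (bank-zero address, extra cycles) for a direct-page operand.

    d      -- 16-bit direct register
    offset -- the single operand byte of the instruction
    """
    address = (d + offset) & 0xFFFF               # assumed to wrap inside bank zero
    extra_cycles = 1 if (d & 0xFF) != 0 else 0    # DL nonzero costs one cycle
    return address, extra_cycles
```

With the direct page on a page boundary, `direct_page_address(0x2000, 0x10)` yields address 0x2010 with no penalty; moving the page to 0x20F0 makes every direct-page access one cycle slower.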

The two new 8-bit bank registers give 
the 65816 its ability to access a full 16 
megabytes of memory. As shown in 
figure 1, the program bank register 
(PBR) is an extension of the program 
counter. A program can read the reg- 
ister's contents only after pushing them 
onto the stack and then loading them 
into another processor register. PBR is 
modified by “long” forms of the jump 
and jump-to-subroutine instructions. Al- 
though its contents can be pushed onto 
the stack, PBR cannot be loaded except 
when it is also loading the program 
counter. If PBR were ever loaded inde- 
pendently, the next instruction after the 
one that loaded it would be fetched 
from the “following” location—but in a 
different bank. This unfamiliar instruc- 
tion, in turn, would most likely lead the 
processor off into never-never land. 

In some addressing modes, such as 
direct page addressing, the data bank 
registers explicitly specify the bank 
number. In any case where the bank 
number is not laid down precisely, the 
data bank registers assign one. Strictly 
speaking, they are not extensions of the 
index registers, except in the sense that 
they help form those addresses that use 
the indexed addressing modes. The 
data bank registers are accessed 
through the stack and can be modified 
by the block move instructions de- 
scribed next. 


NEW INSTRUCTIONS 


In “The CMOS 6502” (December 1983
BYTE, page 443), I pointed out that the
newer version added a number of
instructions to those of the NMOS
(negative-channel metal-oxide
semiconductor) 6502. Those instructions
filled gaps in the instruction set and
rounded it out for the 6502.

The 65816 goes a step further by
adding some nice-to-have instructions
plus some others needed for the change
to 16 bits and the addition of new
registers. The op (operation) codes
corresponding to hexadecimal codes x7
and xF (“set and reset individual bits
in page zero” and “branch if bit set or
reset in page zero”) are present only
on the Rockwell version of the 65C02.
Western Design Center, the designer of
both the 65C02 and the 65816, does not
specify them. On the 65816, Western
reserves the x7 and xF equivalents for
“long” addressing modes of existing
instructions. A description of the long
addressing modes follows.

Figure 3 shows the complete
instruction set of the 65816. The
hexadecimal code for each op code is
given by the digit at the left end of
the row (most significant digit) and
the digit at the top of the column
(least significant digit). The op-code
mnemonic is three capital letters,
followed by the addressing mode in
lowercase. The number at the lower left
of each box gives the number of bytes
for the op code plus its operand(s)
(address or data). In the case of the
immediate addressing mode with 16-bit
data, one additional byte is required.
The number in the lower right of each
box gives the basic number of clock
cycles required for the instruction.

Additional cycles are required for a
variety of cases, since this figure is
merely the number of cycles for
executing that instruction in the
fastest case. For example, add one
cycle for instructions using the direct
register for part of an address if DL
is not zero; add a cycle for read or
write operations using 16-bit data; add
two cycles for read-modify-write
instructions, such as shifts, using
16-bit data; add one cycle for branch
instructions (including BRA) when the
branch is taken, and another if the
branch crosses page boundaries in the
6502 emulation mode; add one cycle for
indexed addressing modes where the
indexing causes a page boundary
crossing; and add one cycle for REP or
SEP (reset or set status register
flags) instructions using the immediate
addressing mode.
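
The adjustments listed above can be collected into a small calculator. The Python sketch below is only a bookkeeping aid: the function and parameter names are invented for illustration, and the base count must still be read from figure 3.

```python
def total_cycles(base, *, direct=False, dl=0, data16=False,
                 read_modify_write=False, branch_taken=False,
                 branch_crosses_page=False, emulation=False,
                 index_crosses_page=False, rep_or_sep=False):
    """Add the penalties described in the text to a base cycle count."""
    cycles = base
    if direct and (dl & 0xFF) != 0:
        cycles += 1                      # DL must be added to the offset
    if data16:
        cycles += 2 if read_modify_write else 1
    if branch_taken:
        cycles += 1
        if branch_crosses_page and emulation:
            cycles += 1                  # page-crossing penalty, emulation only
    if index_crosses_page:
        cycles += 1
    if rep_or_sep:
        cycles += 1
    return cycles
```

For a hypothetical instruction with a base count of 3, a direct-page access with DL nonzero and 16-bit data comes to `total_cycles(3, direct=True, dl=0x80, data16=True)`, or 5 cycles.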

The NMOS 6502 allows several types 
of conditional program branches but 
does not include an unconditional form. 
Since this is the only type of position- 
independent program transfer on the 
6502, some programs use a sequence 
such as SEC, BCS (set the carry flag, 
branch if carry set, respectively) to 
achieve an unconditional branch. The 
65816 instruction BRA (branch relative 
always) fills this gap in the instruction 



AUGUST 1984 * BYTE 385 







THE 65816 


Table 4. W65SC816 Microprocessor Op Code Matrix 
Figure 3: The 65816's instruction set with hexadecimal equivalents, mnemonics, addressing 
mode, number of bytes for the op code and operand, and the basic number of clock cycles 
required for the instruction. 


set. Not only is BRA position
independent, it requires 1 less byte than the 
alternative JMP (unconditional jump) in- 
struction. It is limited to destinations 
within 126 bytes before and 129 bytes 
after the branch instruction, but this 
range is frequently wide enough for 
loops within a program. 
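
The range quoted above follows from a signed 8-bit displacement measured from the instruction after the 2-byte branch. A minimal Python sketch of how an assembler might compute the operand byte is shown here; the function name is hypothetical, and wraparound at bank boundaries is ignored.

```python
def bra_operand(branch_addr, target):
    """Return the one-byte displacement for a BRA located at branch_addr."""
    displacement = target - (branch_addr + 2)   # measured from the next instruction
    if not -128 <= displacement <= 127:
        raise ValueError("target out of range for BRA")
    return displacement & 0xFF                  # two's-complement byte
```

The farthest forward target is 129 bytes past the branch (operand 0x7F), and the farthest backward target is 126 bytes before it (operand 0x80).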

Because the 6502 has a limited num- 
ber of registers, the index registers are 
frequently turned to general-purpose 
uses. Since there are no instructions to 
directly transfer data between index 
registers and the stack or between each 
other, the accumulator sometimes is 
used to mediate transfers. The
sequence PHA (push the A register on the
stack), TXA (transfer the contents of the
X register to A), PHA, TYA (transfer the
contents of Y to A), PHA is common in
subroutines that must preserve all
processor registers for the calling
routine. The sequence for restoring the
registers is equally clumsy. Also, the
data in the A register is destroyed,
and it is tricky for the subroutine to
recover this data from the stack
without destroying data for the index
registers. The 65816 adds the
instructions PHX, PHY (push index
registers to stack), PLX, PLY (pull
index registers from stack), TXY, and
TYX (transfer X to Y and Y to X,
respectively). These instructions let a
subroutine save and restore just those
registers it needs to use, or save them
in a different sequence on the stack so
that the data is available later in the
correct sequence.
The 6502 has found applications both
in general-purpose systems and in
special applications, such as printer
controllers. Most general-purpose
applications don't often need the
ability to set or reset specific bits
of memory or input/output ports. Such
dedicated applications as printer or
appliance controllers, on the other
hand, frequently rely heavily on this
ability.
Multiple-processor systems need the
ability to set and clear semaphores to
arbitrate the use of resources such as
memory and peripherals. Furthermore,
they must be able to do it in such a way
as to change a bit and then have access
to its previous state. The requirement
is made even more demanding by the
stipulation that no other processor can
also get to the semaphore during such
access. The TRB (test and reset bits)
and TSB (test and set bits)
instructions provide this capability.
Bit 7 of the addressed memory location
is copied to the N flag, and bit 6 is
transferred to the V flag. TRB then
computes the logical AND of the memory
data and the inverse of the accumulator
data, while TSB computes the logical OR
of the memory data and the accumulator.
Either instruction then stores the
result back in the memory location and
sets the Z flag to indicate whether or
not the result was 0. The effect is to
set or clear the bits corresponding to
“1” bits in the accumulator, while
loading the N and V flags with the
previous state of bits 7 and 6 of
memory. Two similar instructions, REP
and SEP, use a byte of immediate data
to determine which status register bits
to change. Using immediate data is more
economical than adding distinct
instructions for controlling each such
bit. There is still no instruction for
complementing any of the flags.
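
The bit arithmetic behind these four instructions is compact enough to write out. The Python sketch below models only the stored result for 8-bit data; the flag side effects described above are deliberately left out, and the function names simply reuse the mnemonics for readability.

```python
def tsb(memory, accumulator):
    """Test and set bits: OR the accumulator into the memory byte."""
    return (memory | accumulator) & 0xFF

def trb(memory, accumulator):
    """Test and reset bits: AND memory with the inverse of the accumulator."""
    return memory & ~accumulator & 0xFF

def sep(status, mask):
    """Set the status register bits selected by the immediate mask."""
    return (status | mask) & 0xFF

def rep(status, mask):
    """Reset the status register bits selected by the immediate mask."""
    return status & ~mask & 0xFF
```

For example, `tsb(0b01000001, 0b10000000)` gives 0b11000001, and `trb` with the same accumulator value restores 0b01000001. The mask values are arbitrary example data.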

Eight new instructions permit new 
types of stack accesses. Five of the eight 
transfer registers to and from the stack: 
PHB and PLB for the data bank register, 
PHD and PLD for the direct register, and 
PHK for the program bank register. 
Again, there is no instruction to pull just 
the program bank register from the 
stack. 

The three other new stack instructions 
permit pushing computed data onto the 
stack and then later pulling it back into 
one of the processor registers or ac- 
cessing it through new stack-relative ad- 
dressing modes. The simplest of the 
three is PEA (push effective absolute 
address). This instruction pushes the
third byte and second byte of the
instruction, in that order, onto the
stack. The name implies that the
instruction is meant to be used to pass
a data block's address to a subroutine.
However, it might also be used to pass
any fixed 16-bit data. The effect is
the same as loading the accumulator
with 16-bit data and then pushing the
accumulator to the stack, except that
data in the accumulator is left intact
and PEA executes faster.
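
The push order matters because it leaves the 16-bit value in memory low byte first. A minimal Python model follows; it assumes, as on the 6502, that the stack pointer names the next free location and is decremented after each byte is stored, and the function name and bytearray memory are illustrative only.

```python
def pea(memory, s, operand_low, operand_high):
    """Push the third instruction byte (high), then the second (low).

    Returns the updated stack pointer.
    """
    memory[s] = operand_high
    s = (s - 1) & 0xFFFF
    memory[s] = operand_low
    s = (s - 1) & 0xFFFF
    return s

memory = bytearray(0x10000)                  # bank zero only
s = pea(memory, 0x01FF, 0x34, 0x12)          # push the value 0x1234
# memory[0x01FE] now holds 0x34 and memory[0x01FF] holds 0x12,
# so the value reads back low byte first.
```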

The PEI (push effective indirect
address) instruction is only slightly
more complex than the PEA instruction.
The second byte of the instruction is
used as an offset into the direct page
(pointed to by the direct register),
and the two bytes of data at that and
the next-higher memory location are
pushed onto the stack. The effect is to
save the contents of two memory
locations in the direct page, just as
if they made up a 16-bit processor
register.

PER (push effective program-counter- 
relative address) passes the address of 
a data block to a subroutine. The data 
is then stored within the calling routine, 
and the address is given as an offset 
from the PER instruction’s location. The 
calling routine treats the second and 
third bytes of the instruction as a 16-bit 
quantity and adds them to the program 
counter. The 16-bit result is the data to 
be pushed on the stack. PEA and PER 
both pass data stored in the calling 
routine’s code. While PEA can be used 
to pass a single 16-bit quantity to a sub- 
routine, PER passes the address of a 
whole block of data. 
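
The position independence of PER can be seen by working the arithmetic. In the Python sketch below, the displacement is taken relative to the address just past the 3-byte instruction, which is an assumption about how far the program counter has advanced; the function names are hypothetical.

```python
PER_LENGTH = 3  # op code plus a 16-bit displacement

def per_operand(per_addr, data_addr):
    """Displacement an assembler would store so that PER pushes data_addr."""
    return (data_addr - (per_addr + PER_LENGTH)) & 0xFFFF

def per_pushed_value(per_addr, operand):
    """Address that PER pushes when executed at per_addr."""
    return (per_addr + PER_LENGTH + operand) & 0xFFFF
```

If the routine and its data block are both moved up by 0x1000 bytes, the stored operand is unchanged, yet `per_pushed_value` follows the data to its new address.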

Since the 65816 has several more in- 
ternal registers than the 6502, it needs 
some new instructions for transferring 
data inside itself. Transfers between the 
X and Y index registers were covered 
above. The TCD (transfer accumulator 
C to the direct register) and TDC (trans- 
fer D to C) instructions provide the only 
way to set or read the direct register 
without using the stack. The 65816’s TCS 
(transfer accumulator C to stack pointer) 
and TSC (transfer S to C) instructions 
ease the 6502's bottleneck of being 
able to access the stack pointer only 
through the X register. XBA (exchange 
B and A) lets you flip-flop the upper and 
lower 8-bit halves of the accumulator to 
help it handle mixed 8- and 16-bit data. 
The Z and N flags are set during the ex- 
change to indicate the new status of the 
accumulator. The XCE (exchange carry 
flag with emulation flag) is needed be- 
cause the emulation (E) flag does not 
actually belong to the status register 
proper. 

The RTL (return from subroutine— 
long) instruction is needed because a 
new absolute long addressing mode has 
been applied to the JSR (jump to sub- 
routine) instruction. In addition to re- 
storing the program counter, RTL also 
restores the program bank register from 
the stack. In keeping with the
convention that higher-order bytes
appear higher in memory, the long JSR
pushes the program bank register before
the normal program counter, and RTL
pulls it after. RTI (return from
interrupt) is always treated as a long
return, since interrupts, by their very
nature, must specify the full address
of the interrupt routine, possibly
changing the program bank register in
the process.
Several new instructions deal with 
control of the system bus and clocks. 
The COP (coprocessor) instruction sup- 
ports coprocessors and acts like a soft- 
ware interrupt, except with its own vec- 
tor. The STP (stop the clock) instruction 
stops the system clock with the phase 
two clock high. This state can be
released only by a reset, which then
functions as a high-priority interrupt, with 
the stack set up correctly for the RTI in- 
struction to resume processing with the 
instruction following the STP instruction. 
The WAI (wait for interrupt) instruction 
stops execution and waits for either the 
maskable or nonmaskable interrupt 
(IRQ or NMI). One additional code 
(WDM) is defined as a no-operation in- 
struction, but it is reserved for future 
use. The mnemonic comes from the de- 
sign engineer's initials. 

The new block-move instructions, 
MVP and MVN, are reminiscent of the 
block moves of the Z80 processor, ex- 
cept that they can be used to move a 
block of memory anywhere within the 
full 16-megabyte address space. MVP 
(move preceding) is meant to move data 
higher in memory. The move starts at 
the top end of the block and proceeds 
downward. This procedure avoids
overwriting data that has yet to be
moved in cases where the old and new
block locations overlap.

MVN (move next) is similarly intended 
for moving data. In this case, however, 
data is moved to a position that is lower 
in memory, starting at the low end of 
the block and working upward through 
it. The second byte of the instruction 
gives the bank address of the destina- 
tion memory area and the third byte 
gives the bank address of the old loca- 
tion in memory. The X register contains 
the low-order 16 bits of the first byte's 
current address. The Y register contains 
the low-order 16 bits of the new ad- 
dress. The C accumulator contains the 
number of to-be-moved bytes minus 
one. Each byte moved requires seven
clock cycles, and the move can be
interrupted and resumed through the
normal interrupt mechanisms. After
completion, the data bank register
contains the destination memory area's
bank address. This is an important
consideration since the program will
normally need to operate on the data
after moving it. The X and Y registers
are incremented or decremented as the
move proceeds. As a result, the Y
register will hold the 16 low-order
address bits of the next byte beyond
(above or below) the new area. The C
accumulator contains hexadecimal FFFF.
The 65816 boasts versions suitable for
clock rates from 1 to 10 MHz. This
capacity allows a system using the
slowest clock to move a complete
64K-byte data block (characters to a
very high resolution memory-mapped
video display, for example) in
just under half a second. With a 10-MHz 
clock, the same transfer would take 
somewhat less time than a complete 
screen refresh (standard monitor with 
interlaced scanning). 
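
To make the register conventions and the timing concrete, here is a minimal Python model of MVN as described above. It is a sketch, not an emulator: memory is a dictionary keyed by 24-bit address, interrupts are ignored, and the function names are chosen for illustration.

```python
CYCLES_PER_BYTE = 7

def mvn(memory, src_bank, dst_bank, x, y, c):
    """Copy C + 1 bytes upward; return (x, y, c, data_bank, cycles)."""
    cycles = 0
    while True:
        memory[(dst_bank << 16) | y] = memory[(src_bank << 16) | x]
        x = (x + 1) & 0xFFFF
        y = (y + 1) & 0xFFFF
        c = (c - 1) & 0xFFFF
        cycles += CYCLES_PER_BYTE
        if c == 0xFFFF:                 # count has run past zero: finished
            break
    return x, y, c, dst_bank, cycles

def move_seconds(byte_count, clock_hz):
    """Time for a block move at seven cycles per byte."""
    return byte_count * CYCLES_PER_BYTE / clock_hz
```

`move_seconds(65536, 1_000_000)` evaluates to 0.458752, the "just under half a second" quoted for a full bank at 1 MHz.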


ADDRESSING MODES 


The 65816 includes all the addressing 
modes of the standard 6502 plus a 
number of new ones. The existing ad- 
dressing modes were extended to allow 
for 24-bit addresses and 16-bit data 
where appropriate. Internal operations 
employing implied addresses take the 
data width corresponding to the speci- 
fic registers being used. Most of the 
6502 addressing modes now have sepa- 
rate long addressing modes, allowing 
them to use either a 16-bit address 
within the bank specified by the data 
bank pointer or a 24-bit address within 
the full 16-megabyte address space. 

The immediate mode of the 6502 as- 
sumes that one of the operands is an 
8-bit quantity stored immediately after 
the op code. For operations on 16-bit 
registers, the 65816 lets you store 16-bit 
data in the two bytes following the 
operation. 

The addressing modes—absolute, in- 
direct indexed, and absolute indexed 
with X (but not Y)—have new long forms 
as well as standard forms. Their stan- 
dard addressing modes specify the low- 
order 16 bits of the address; the high- 
order 8 bits come from the data bank 
register. The zero page designation on 
certain modes has been replaced by 
direct page. The change is consistent 
with this special page's new mobility as 
granted by the direct register but leads 
to such double-talk as “direct indirect 
indexed” for an addressing mode 
description. To translate, keep in mind 
that “direct” always refers to the direct 
register or page, “indirect” refers to the 
memory location(s) holding a further 
address rather than data, and “indexed” 
means adding an index register to the 
address formed up to that point. Also, 
the name of the addressing mode 
describes, from left to right, the order 
in which the operations are performed. 
Thus, the example “direct indirect 
indexed” means that the instruction’s 
operand specifies a direct-page location 
(direct), which contains only an address 
(indirect), and to which the Y register is 
added (indexed). Adding the Y register 
forms the data’s final address for that 
particular instruction. Other addressing 
modes use these terms in different se- 
quences and in combination with other, 
more self-explanatory terms. For all 
cases, the same general idea holds. 
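
Reading the name from left to right translates directly into three steps of address arithmetic. The Python sketch below follows the description above for “direct indirect indexed”; the function name and the dictionary used for memory are illustrative, and letting a carry from the index propagate into the bank byte is an assumption of this sketch.

```python
def direct_indirect_indexed(memory, d, dbr, operand, y):
    """Return the 24-bit data address for a direct indirect indexed operand."""
    # direct: the operand selects a location in the direct page (bank zero)
    pointer_addr = (d + operand) & 0xFFFF
    # indirect: that location holds a 16-bit address, low byte first
    low = memory[pointer_addr]
    high = memory[(pointer_addr + 1) & 0xFFFF]
    base = (dbr << 16) | (high << 8) | low     # bank comes from the data bank register
    # indexed: the Y register is added to form the final address
    return (base + y) & 0xFFFFFF
```

With the direct register at 0x2000, the pointer 0x4000 stored at offset 0x10, a data bank of 5, and Y equal to 3, the result is 0x054003.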
The branch, jump, and jump-to-sub- 
routine instructions each retain their old 
6502 addressing modes. The jump and 
jump-to-subroutine instructions also 
have new long addressing modes that 
load both the program counter and the 
program bank register with new values. 
In addition, both jump and
jump-to-subroutine have new indexed
addressing modes that use the X
register as the index of a subroutine
address table. The unconditional branch
instruction’s long addressing mode
specifies a 16-bit displacement from
the current location.
This displacement can be positive or 
negative and wraps around within the 
current program bank. Conditional 
branches have no analogous long ad- 
dressing mode. 

Expanded stack addressing modes
complement the new stack manipulation
instructions. With parameters passed to
it on the stack, a subroutine can read
or modify them as easily as it can
access those parameters stored in a
fixed block of memory. It can pop or
push parameters as before and can also
access them without moving the stack
pointer. All it has to do is specify a
displacement ranging from 0 to 255
bytes above the stack pointer. Using
the “stack relative indexed” addressing
mode, a subroutine can use a stack
parameter as the table address to be
indexed by the Y index register. Since
the stack pointer can now be
transferred directly to or from the
accumulator, a simple arithmetic
operation can allocate or delete large
stack areas. Combined with the long
relative branch, the new
stack-addressing modes herald the 6502
family’s move toward
position-independent, reentrant, and
even recursive routines.
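
The stack arithmetic involved is simple enough to sketch in Python. The function names below are hypothetical; `allocate` and `release` stand for the transfer-subtract-transfer and transfer-add-transfer sequences a program would perform through the accumulator.

```python
def stack_relative(s, displacement):
    """Address of a stack parameter at a given displacement from S."""
    if not 0 <= displacement <= 255:
        raise ValueError("displacement must fit in one byte")
    return (s + displacement) & 0xFFFF

def allocate(s, size):
    """Reserve size bytes of stack by lowering the stack pointer."""
    return (s - size) & 0xFFFF

def release(s, size):
    """Give the reserved bytes back by raising the stack pointer."""
    return (s + size) & 0xFFFF
```

Starting from a stack pointer of 0x01FF, `allocate(0x01FF, 16)` returns 0x01EF, and the first reserved byte is then reachable at `stack_relative(0x01EF, 1)`, or 0x01F0.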

With its enhanced instruction set, new
addressing modes, expanded interrupt
handling, and 16-megabyte address
space, the 65816 enters the 16-bit arena
on equal footing with the 8086 and
68000. Its one handicap is its lateness,
but that might be offset by the
familiar, compact, and powerful 6502
instruction set that it brings along.
Next month I will show you some
hardware enhancements that ease the
system designer's burden when using this
6502 cousin in boxes ranging from
dedicated controllers to multiuser,
multiprocessor systems.